Search | VHL Regional Portal

scGMAAE: Gaussian mixture adversarial autoencoders for diversification analysis of scRNA-seq data.

Wang, Hai-Yun; Zhao, Jian-Ping; Zheng, Chun-Hou; Su, Yan-Sen.

Brief Bioinform ; 24(1)2023 01 19.

Article in English | MEDLINE | ID: mdl-36592058

ABSTRACT

Advances in single-cell RNA sequencing (scRNA-seq) have produced a large volume of scRNA-seq data, which are widely used in biomedical research. Noise in the raw data and the tens of thousands of genes measured make it challenging to capture the real structure and the useful information in scRNA-seq data. Most existing single-cell analysis methods assume that the low-dimensional embedding of the raw data follows a Gaussian distribution or lies in a low-dimensional nonlinear space without any prior information, which greatly limits the flexibility and controllability of the model. In addition, many existing methods have a high computational cost, which makes them difficult to apply to large-scale datasets. Here, we design and develop a deep generative model named Gaussian mixture adversarial autoencoders (scGMAAE). It assumes that the low-dimensional embeddings of different cell types follow different Gaussian distributions and integrates Bayesian variational inference with adversarial training, so as to give an interpretable latent representation of complex data and discover the statistical distribution of different cell types. scGMAAE offers good controllability, interpretability and scalability, so it can process large-scale datasets in a short time and give competitive results. scGMAAE outperforms existing methods in several tasks, including dimensionality reduction and visualization, cell clustering, differential expression analysis and batch effect removal. Importantly, compared with most deep learning methods, scGMAAE requires fewer iterations to reach its best results.

Subjects

Gene Expression Profiling , Single-Cell Gene Expression Analysis , Gene Expression Profiling/methods , Sequence Analysis, RNA/methods , Normal Distribution , Bayes Theorem , Single-Cell Analysis/methods , Cluster Analysis
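
The scGMAAE abstract above assumes that each cell type has its own Gaussian distribution in the latent space. The following is a minimal sketch of what sampling from such a mixture prior could look like; in adversarial autoencoders generally, samples of this kind are what a discriminator compares against the encoder output. This is not the authors' implementation: the function name `sample_mixture_prior`, the isotropic components, and all numeric values are assumptions chosen for illustration.

```python
import random


def sample_mixture_prior(means, stds, weights, n_samples, seed=0):
    """Draw latent vectors from a mixture of isotropic Gaussians.

    means:   list of component mean vectors (one per assumed cell type)
    stds:    list of per-component standard deviations (scalars)
    weights: list of mixing proportions
    Returns (samples, component_ids).
    """
    rng = random.Random(seed)
    # Pick a mixture component (hypothetical cell type) for every sample.
    component_ids = rng.choices(range(len(means)), weights=weights, k=n_samples)
    # Sample each latent coordinate around the chosen component mean.
    samples = [
        [rng.gauss(mu, stds[k]) for mu in means[k]]
        for k in component_ids
    ]
    return samples, component_ids


# Illustrative values only: three components in a 2-D latent space.
means = [[-3.0, 0.0], [3.0, 0.0], [0.0, 4.0]]
stds = [0.5, 0.5, 0.5]
weights = [0.5, 0.3, 0.2]
latent, labels = sample_mixture_prior(means, stds, weights, n_samples=1000)
```

Keeping the component index alongside each sample is a design choice that makes it easy to check which assumed cell type a latent point was drawn from.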

scCNC: a method based on capsule network for clustering scRNA-seq data.

Wang, Hai-Yun; Zhao, Jian-Ping; Zheng, Chun-Hou; Su, Yan-Sen.

Bioinformatics ; 38(15): 3703-3709, 2022 08 02.

Article in English | MEDLINE | ID: mdl-35699473

ABSTRACT

MOTIVATION: Many studies have shown that clustering is a crucial step in scRNA-seq analysis. Most existing methods are based on unsupervised learning and exploit no prior domain knowledge, so they do not use available gold-standard labels. When confronted with the high dimensionality and frequent dropout events of scRNA-seq data, purely unsupervised clustering methods may not produce biologically interpretable clusters, which complicates cell type assignment. RESULTS: In this article, we propose a semi-supervised clustering method based on a capsule network, named scCNC, that integrates domain knowledge into the clustering step. We also propose a Semi-supervised Greedy Iterative Training method to train the whole network. Experiments on real scRNA-seq datasets show that scCNC can significantly improve clustering performance and facilitate downstream analyses. AVAILABILITY AND IMPLEMENTATION: The source code of scCNC is freely available at https://github.com/WHY-17/scCNC. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Gene Expression Profiling , Single-Cell Analysis , Sequence Analysis, RNA/methods , Single-Cell Analysis/methods , Gene Expression Profiling/methods , Cluster Analysis , Software

Promoter prediction in Nannochloropsis based on densely connected convolutional neural networks.

Wei, Pi-Jing; Pang, Zhen-Zhen; Jiang, Lin-Jie; Tan, Da-Yu; Su, Yan-Sen; Zheng, Chun-Hou.

Methods ; 204: 38-46, 2022 08.

Article in English | MEDLINE | ID: mdl-35367367

ABSTRACT

A promoter is a key DNA element located near the transcription start site that regulates gene transcription by binding RNA polymerase. The identification of promoters is therefore an important research topic in synthetic biology. Nannochloropsis is an important unicellular, oleaginous industrial microalga. Some studies have identified Nannochloropsis promoters with specific functions by biological methods, whereas few have used computational methods. Here, we propose a method called DNPPro (DenseNet-Predict-Promoter), based on densely connected convolutional neural networks, to predict Nannochloropsis promoters. First, we collected promoter sequences from six Nannochloropsis strains and, for each strain, removed redundant sequences at an 80% similarity threshold using CD-HIT to yield a reliable positive dataset. Then, to construct a robust classifier, a within-group scrambling method was used to generate the negative dataset, which overcomes the limitation of randomly selecting a non-promoter region from the same genome as a negative sample. Finally, we constructed a densely connected convolutional neural network that takes the one-hot encoding of the sequence as input. Compared with commonly used sequence processing methods, DNPPro can extract long sequence features to a greater extent. A cross-strain experiment on an independent dataset verifies the generalization of our method, and t-SNE visualization shows that our method can effectively distinguish promoters from non-promoters.

Subjects

Neural Networks, Computer , Synthetic Biology , Promoter Regions, Genetic
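
The DNPPro abstract above mentions two data-preparation steps: one-hot encoding of sequences and generating negatives by within-group scrambling. The sketch below illustrates both under stated assumptions. The one-hot layout (A, C, G, T columns, all zeros for unknown bases) is a common convention. The scrambling function shows only one plausible reading of "within-group scrambling" (shuffling bases inside fixed-size windows); the abstract does not define the procedure, so `scramble_within_groups` and its `group_size` parameter are hypothetical.

```python
import random

ALPHABET = "ACGT"


def one_hot(seq):
    """Encode a DNA string as rows of 4 values in A, C, G, T order.

    Bases outside the alphabet (for example N) become all-zero rows.
    """
    index = {base: i for i, base in enumerate(ALPHABET)}
    rows = []
    for base in seq.upper():
        row = [0, 0, 0, 0]
        if base in index:
            row[index[base]] = 1
        rows.append(row)
    return rows


def scramble_within_groups(seq, group_size=10, seed=0):
    """Shuffle bases inside consecutive windows of `group_size`.

    Assumption: this is one possible interpretation of within-group
    scrambling. It preserves local base composition per window while
    destroying the original motif order.
    """
    rng = random.Random(seed)
    chunks = []
    for start in range(0, len(seq), group_size):
        chunk = list(seq[start:start + group_size])
        rng.shuffle(chunk)
        chunks.append("".join(chunk))
    return "".join(chunks)


positive = "TATAAAGGCTAGCTAGGATCCGATCGATTACG"  # made-up example sequence
negative = scramble_within_groups(positive, group_size=8)
encoded = one_hot(positive)
```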

Extra Trees Method for Predicting LncRNA-Disease Association Based On Multi-Layer Graph Embedding Aggregation.

Wu, Qing-Wen; Cao, Rui-Fen; Xia, Jun-Feng; Ni, Jian-Cheng; Zheng, Chun-Hou; Su, Yan-Sen.

IEEE/ACM Trans Comput Biol Bioinform ; 19(6): 3171-3178, 2022.

Article in English | MEDLINE | ID: mdl-34529571

ABSTRACT

Many experimental studies have revealed significant associations between lncRNAs and diseases. Identifying these associations accurately will provide a new perspective for disease therapy. Computational methods have been developed for this problem, but they have some limitations. In this paper, we propose an accurate method, named MLGCNET, to discover potential lncRNA-disease associations. First, we reconstructed similarity networks for both lncRNAs and diseases using top-k similarity information, and constructed a lncRNA-disease heterogeneous network (LDN). Then, we applied a Multi-Layer Graph Convolutional Network to the LDN to obtain latent feature representations of the nodes. Finally, Extra Trees was used to calculate the probability of association between a disease and a lncRNA. The results of extensive 5-fold cross-validation experiments show that MLGCNET has superior prediction performance compared with state-of-the-art methods. Case studies confirm the performance of our model on specific diseases. The experimental results support the effectiveness and practicality of MLGCNET in predicting potential lncRNA-disease associations.

Subjects

Neoplasms , RNA, Long Noncoding , Humans , Neoplasms/genetics , RNA, Long Noncoding/genetics , Computational Biology/methods , Probability , Algorithms
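
The MLGCNET abstract above describes reconstructing similarity networks from top-k similarity information. A minimal sketch of one way to do this is given below: for each node, keep only its k most similar neighbours and zero out the rest. The function name `top_k_similarity`, the removal of self-similarity, and the symmetrization step are assumptions for illustration; the abstract does not state these details.

```python
def top_k_similarity(sim, k):
    """Keep, for each node, only its k most similar neighbours.

    sim: square similarity matrix as a list of lists.
    An edge is kept if either endpoint ranks the other in its top k
    (an assumed symmetrization rule). Self-similarity is dropped.
    """
    n = len(sim)
    kept = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Rank all other nodes by similarity to node i, highest first.
        neighbours = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: sim[i][j],
            reverse=True,
        )[:k]
        for j in neighbours:
            kept[i][j] = sim[i][j]
            kept[j][i] = sim[i][j]
    return kept


# Toy 3-node similarity matrix with made-up values.
sim = [
    [1.0, 0.9, 0.1],
    [0.9, 1.0, 0.4],
    [0.1, 0.4, 1.0],
]
sparse = top_k_similarity(sim, k=1)
```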

scCDG: A Method Based on DAE and GCN for scRNA-Seq Data Analysis.

Wang, Hai-Yun; Zhao, Jian-Ping; Su, Yan-Sen; Zheng, Chun-Hou.

IEEE/ACM Trans Comput Biol Bioinform ; 19(6): 3685-3694, 2022.

Article in English | MEDLINE | ID: mdl-34752401

ABSTRACT

Identifying cell types is one of the main goals of single-cell RNA sequencing (scRNA-seq) analysis, and clustering is a common method for this task. However, the massive amount of data and the high noise level pose challenges for single-cell clustering. To address these challenges, we introduce a novel method named single-cell clustering based on denoising autoencoder and graph convolution network (scCDG), which consists of two core models. The first is a denoising autoencoder (DAE) that fits the data distribution in order to denoise the data. The second is a graph autoencoder built on a graph convolution network (GCN), which projects the data into a compressed, low-dimensional space while preserving both the topological structure and the feature information of the scRNA-seq data. Extensive analysis on seven real scRNA-seq datasets demonstrates that scCDG outperforms state-of-the-art methods in several tasks, including single-cell clustering, visualization of the transcriptome landscape, and trajectory inference.

Subjects

Gene Expression Profiling , Single-Cell Gene Expression Analysis , Gene Expression Profiling/methods , Sequence Analysis, RNA/methods , Single-Cell Analysis/methods , Cluster Analysis , Data Analysis
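
The scCDG abstract above relies on a graph convolution network to combine topology with features. The sketch below shows a single graph convolution step in the widely used form ReLU(D^-1/2 (A + I) D^-1/2 H W). Whether scCDG uses exactly this normalization is an assumption, since the abstract does not say; the helper names and the toy matrices are illustrative only.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def normalized_adjacency(adj):
    """Return D^-1/2 (A + I) D^-1/2, where D is the degree matrix of A + I."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def gcn_layer(adj, features, weights):
    """One graph convolution: aggregate neighbours, project, apply ReLU."""
    propagated = matmul(matmul(normalized_adjacency(adj), features), weights)
    return [[max(0.0, v) for v in row] for row in propagated]


# Toy graph of two connected cells with 2-D features (made-up values).
adj = [[0.0, 1.0], [1.0, 0.0]]
features = [[1.0, 0.0], [0.0, 1.0]]
weights = [[2.0, -1.0], [0.0, -1.0]]
hidden = gcn_layer(adj, features, weights)
```

Adding the identity before normalizing lets each cell keep part of its own signal while averaging over its neighbours.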

GCAEMDA: Predicting miRNA-disease associations via graph convolutional autoencoder.

Li, Lei; Wang, Yu-Tian; Ji, Cun-Mei; Zheng, Chun-Hou; Ni, Jian-Cheng; Su, Yan-Sen.

PLoS Comput Biol ; 17(12): e1009655, 2021 12.

Article in English | MEDLINE | ID: mdl-34890410

ABSTRACT

MicroRNAs (miRNAs) are small non-coding RNAs involved in a number of complicated biological processes. A growing body of studies has suggested that miRNAs are closely associated with many human diseases. Disease-related miRNAs are therefore worth considering as potential biomarkers, which could greatly contribute to understanding the mechanisms of complex diseases and benefit their prevention, detection, diagnosis and treatment. In this study, we present a novel model named Graph Convolutional Autoencoder for miRNA-Disease Association Prediction (GCAEMDA). In the proposed model, we use miRNA-miRNA similarities, disease-disease similarities and verified miRNA-disease associations to construct a heterogeneous network, from which the embeddings of miRNAs and diseases are learned. In addition, we separately construct miRNA-based and disease-based sub-networks. Combining the embeddings of miRNAs and diseases, a graph convolutional autoencoder (GCAE) is used to calculate miRNA-disease association scores on each of the two sub-networks. We then obtain the final prediction scores between miRNAs and diseases by averaging the prediction scores from the two types of sub-networks. To assess the accuracy of GCAEMDA, we evaluated the model with different cross-validation methods, and its performance was better than that of state-of-the-art models. Case studies on common human diseases were also carried out to demonstrate the effectiveness of GCAEMDA. The results show that GCAEMDA is useful for inferring potential miRNA-disease associations.

Subjects

Genetic Predisposition to Disease/genetics , MicroRNAs/genetics , Models, Genetic , Neural Networks, Computer , Algorithms , Area Under Curve , Computational Biology/methods , Humans , MicroRNAs/metabolism , Neoplasms/genetics , Neoplasms/metabolism
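
The GCAEMDA abstract above combines two similarity matrices and known associations into a heterogeneous network, then averages the scores from two sub-networks. The sketch below illustrates both steps under stated assumptions: the block layout [[S_m, A], [A^T, S_d]] is a common convention for such networks and is not confirmed by the abstract, and the function names and toy values are hypothetical.

```python
def heterogeneous_adjacency(mirna_sim, disease_sim, assoc):
    """Assemble the block matrix [[S_m, A], [A^T, S_d]].

    mirna_sim:   miRNA-miRNA similarity (m x m)
    disease_sim: disease-disease similarity (d x d)
    assoc:       known miRNA-disease associations (m x d)
    """
    top = [list(mirna_sim[i]) + list(assoc[i]) for i in range(len(mirna_sim))]
    assoc_t = [list(col) for col in zip(*assoc)]
    bottom = [assoc_t[j] + list(disease_sim[j]) for j in range(len(disease_sim))]
    return top + bottom


def average_ensemble(scores_a, scores_b):
    """Average two score matrices of the same shape, element by element."""
    return [
        [(x + y) / 2 for x, y in zip(row_a, row_b)]
        for row_a, row_b in zip(scores_a, scores_b)
    ]


# Toy example: two miRNAs and one disease, with made-up values.
mirna_sim = [[1.0, 0.5], [0.5, 1.0]]
disease_sim = [[1.0]]
assoc = [[1.0], [0.0]]
network = heterogeneous_adjacency(mirna_sim, disease_sim, assoc)

# Hypothetical scores from the miRNA-based and disease-based sub-networks.
final_scores = average_ensemble([[0.5], [0.25]], [[1.0], [0.25]])
```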

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

ENVIAR RESULTADO:

SELEÇÃO DE REFERÊNCIAS

DETALHE DA PESQUISA